[CI/Build] Bump flashinfer to v0.6.10 #41711

Open

arpera wants to merge 5 commits into vllm-project:main from arpera:bump-flashinfer-0.6.10

Conversation

arpera (Contributor) commented May 5, 2026

Purpose

  • Bump FlashInfer from v0.6.8.post1 to v0.6.10.
  • Adjust the installation to use the flashinfer-python[cu13] extra for cu13 users.


Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify (Bot) commented May 5, 2026

Hi @arpera, the pre-commit checks have failed. Please run:

```
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

```
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
```

gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the FlashInfer version to 0.6.10 across the project's Docker configurations and dependency files. It also introduces conditional logic in the Dockerfile and setup.py to include the [cu13] extra for flashinfer-python when CUDA 13 is detected, facilitating support for SM100 GDN kernels. I have no feedback to provide.

@pavanimajety pavanimajety added the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label May 5, 2026

pavanimajety (Collaborator) commented:

FYI: 0.6.9 update - #40998

arpera (Contributor, Author) commented May 5, 2026

Yes, I've seen that PR (#40998), thanks. It wasn't finished, so I think v0.6.10 now makes more sense.

arpera (Contributor, Author) commented May 5, 2026

I would also like to point out that in this PR, in addition to directly integrating the new FI version (v0.6.10), I made a small fix that wasn't accounted for in vLLM when previous FI versions were integrated.
Specifically, I added installation of the `flashinfer-python[cu13]` extra for the case where the user has cu13 installed. This is currently necessary because, without the extra, FlashInfer does not install `nvidia-cutlass-dsl[cu13]` by default, which is required in particular for the FI Blackwell GDN implementation whose support I'm currently trying to add here: #40717.

There is also a small discussion of this issue in comments: 1, 2.

Since I don't have much experience managing build dependencies in vLLM, I'd be happy to get suggestions for a more correct way to handle this.
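
For illustration, a minimal sketch in Python of the kind of conditional install described above. This is not vLLM's actual setup.py code; the helper names and the version pin are assumptions:

```python
# Illustrative sketch only, not vLLM's actual setup.py logic: select the
# flashinfer-python requirement string based on the CUDA major version
# reported by nvcc. FLASHINFER_VERSION and the helper names are assumptions.
import os
import re
import subprocess

FLASHINFER_VERSION = "0.6.10"

def cuda_major_version() -> int | None:
    """Best-effort CUDA major version (e.g. 12 or 13) from `nvcc --version`."""
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    try:
        out = subprocess.check_output(
            [os.path.join(cuda_home, "bin", "nvcc"), "--version"], text=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    match = re.search(r"release (\d+)\.", out)
    return int(match.group(1)) if match else None

def flashinfer_requirement() -> str:
    # On CUDA 13, the [cu13] extra is needed so that FlashInfer pulls in
    # nvidia-cutlass-dsl[cu13], required for the Blackwell GDN kernels.
    if cuda_major_version() == 13:
        return f"flashinfer-python[cu13]=={FLASHINFER_VERSION}"
    return f"flashinfer-python=={FLASHINFER_VERSION}"
```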

wzhao18 (Contributor) commented May 5, 2026

I am noticing some potential numeric issues with the newer FlashInfer versions. Specifically, the generation length for GPQA with DSv4 is significantly longer with the new versions than before (Claude suggests the model is stuck in a self-doubt loop).

I am still investigating the issue, but I wanted to flag it. It may be worth doing more eval studies before merging this.

@pavanimajety pavanimajety removed the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label May 5, 2026

vadiklyutiy (Collaborator) commented:

Do I understand correctly that if you have an environment with cu13 and run `pip install flashinfer-python`, it doesn't install everything, and users additionally have to run `pip install flashinfer-python[cu13]`?

arpera (Contributor, Author) commented May 6, 2026

Yes, that's right.

wzhao18 (Contributor) commented May 6, 2026

@arpera With more investigation, I think the issue I was hitting was not related to the newer FlashInfer versions (but to something else). I tested the v0.6.10 GPQA eval with DeepSeek V4, and it looks good. I have no more concerns about upgrading.

aleozlx commented May 7, 2026

Side note: we released 0.6.10.post1 not long ago to fix an allreduce hang caused by a missing rendezvous sync group:
https://github.com/flashinfer-ai/flashinfer/commits/v0.6.10.post1/

arpera (Contributor, Author) commented May 7, 2026

@aleozlx, that is good to know, thank you!
I also see from the failed CI jobs here that some jobs produce huge logs (~7 GB), which is really odd. The problem is probably due to TLLM_LOG_LEVEL=debug being used by some of FI's kernels in v0.6.10. I haven't tried to reproduce it locally yet; this is just a preliminary analysis based on the logs.
Could you please check whether some of FI's kernels were compiled with this env var set to debug?

johnnynunez (Contributor) commented:

0.6.11 is out cc @mgoin

aleozlx pushed a commit to flashinfer-ai/flashinfer that referenced this pull request May 11, 2026
## 📌 Description

`gen_jit_spec` adds `-DNDEBUG` only to `extra_cuda_cflags` (consumed by
`nvcc` for `.cu` files), not to `extra_cflags` (consumed by `g++` for
host-side `.cpp`). Several host-only translation units are part of
MoE/GEMM JIT specs — most notably
`csrc/nv_internal/cpp/common/logger.cpp` — and they end up compiled
without `NDEBUG` while the rest of the module is a release build.

For the TensorRT-LLM logger this matters because of:

```cpp
// csrc/nv_internal/include/tensorrt_llm/common/logger.h
#ifndef NDEBUG
  Level const DEFAULT_LOG_LEVEL = DEBUG;
#else
  Level const DEFAULT_LOG_LEVEL = INFO;
#endif
```

With `NDEBUG` missing on the host side, every prebuilt
`flashinfer-jit-cache` wheel ships with `Logger::level_ = DEBUG (10)`.
On Hopper this turns each MoE forward pass into a stream of
`[TensorRT-LLM][DEBUG] ... sm90_generic_mixed_moe_gemm_kernelLauncher
...` lines from the OSS CUTLASS kernel dispatcher. Verified by reading
the data-section initializer of `Logger::Logger()` in the released
`flashinfer-jit-cache==0.6.10+cu130`
`fused_moe_{90,100,103,120,trtllm_sm100}.so` — all five start `Logger`
with `DEFAULT_LOG_LEVEL=10` and `level_=10`, even though the same wheels
carry no `.debug_*` sections (i.e. they are otherwise release-built).

The fix is one line: also append `-DNDEBUG` to the host `cflags` when
not in debug mode. The `flashinfer-jit-cache` wheel build picks this up
automatically and the prebuilt logger flips back to `INFO`.
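
In other words (a hedged sketch; the names are illustrative, not the exact code in `flashinfer/jit/core.py`):

```python
# Sketch of the one-line fix described above; names are illustrative.
def apply_release_flags(
    extra_cflags: list[str], extra_cuda_cflags: list[str], debug: bool
) -> None:
    if not debug:
        extra_cuda_cflags.append("-DNDEBUG")  # already the case: seen by nvcc (.cu)
        extra_cflags.append("-DNDEBUG")       # the fix: now also seen by g++ (.cpp)
```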

## 🔍 Related Issues

Initially this bug was observed during integration of FI v0.6.10 into
vLLM: [[CI/Build] Bump flashinfer to v0.6.10
#41711](vllm-project/vllm#41711).
There is a CI job log failure due to this issue:
[buildkite/ci/pr/distributed-tests-2-gpus-h100](https://buildkite.com/vllm/ci/builds/64532#019df966-e67d-4c27-af0e-76b00bc496e5).

Surfaced while debugging a downstream CI step that produced a 2.9 GB log
dominated by TRT-LLM debug prints from `fused_moe_90.so`. No FlashInfer
issue tracking this yet — happy to file one alongside this PR if useful.

## 🚀 Pull Request Checklist

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`.
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`pytest tests/test_jit_cpp_ext.py`).

Two regression tests added in `tests/test_jit_cpp_ext.py`, mirroring the
existing `test_debug_jit_uses_sccache_compatible_nvcc_device_debug_flag`
style:

```
pytest tests/test_jit_cpp_ext.py -v
```

```
test_release_jit_propagates_ndebug_to_host_cflags PASSED
test_debug_jit_does_not_propagate_ndebug          PASSED
```

The first asserts that a release build
(`FLASHINFER_JIT_DEBUG`/`FLASHINFER_JIT_VERBOSE` unset) puts `-DNDEBUG`
in **both** `spec.extra_cflags` and `spec.extra_cuda_cflags`. The second
locks in symmetry: with `FLASHINFER_JIT_DEBUG=1` neither list contains
`-DNDEBUG`. Without the fix, the first test fails on `assert "-DNDEBUG"
in spec.extra_cflags`.
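
For reference, the rough shape of the first test (assuming `gen_jit_spec` is importable from `flashinfer.jit`; its arguments here are placeholders, see `tests/test_jit_cpp_ext.py` for the real fixtures):

```python
# Rough shape only; the gen_jit_spec arguments are placeholders.
from flashinfer.jit import gen_jit_spec

def test_release_jit_propagates_ndebug_to_host_cflags(monkeypatch):
    # Release mode: neither debug env var is set.
    monkeypatch.delenv("FLASHINFER_JIT_DEBUG", raising=False)
    monkeypatch.delenv("FLASHINFER_JIT_VERBOSE", raising=False)
    spec = gen_jit_spec("dummy_module", ["dummy.cu", "dummy.cpp"])
    assert "-DNDEBUG" in spec.extra_cflags       # host side (g++): the fixed path
    assert "-DNDEBUG" in spec.extra_cuda_cflags  # device side (nvcc): already correct
```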

## Reviewer Notes

Single-line behavior change in `flashinfer/jit/core.py`. No effect on
debug builds. Prebuilt wheels rebuilt from this commit will pick up the
change automatically — no schema/version bump needed.



Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

arpera (Contributor, Author) commented May 11, 2026

The current up-to-date FI version (v0.6.11) has a problem: some of the kernels were compiled with DEFAULT_LOG_LEVEL=DEBUG. Due to this problem, some tests in our CI run failed, for example buildkite/ci/pr/gpqa-eval-gpt-oss-h100. I managed to fix that issue on FI's side and merged the fix upstream: fix(jit): propagate -DNDEBUG to host-side cflags #3278.

That leads to a question I would like to ask you. As I understand it, we now have two options:

  1. Wait until the next FI release so the patch is included.
  2. Add a temporary workaround in vLLM to override DEFAULT_LOG_LEVEL from DEBUG to INFO (see the sketch below).

What do you think about these options? Should we wait until the next release, or should we add a temporary workaround in vLLM and remove it later?
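
For context, a minimal sketch of what option 2 could look like, assuming the vendored TensorRT-LLM logger honors the TLLM_LOG_LEVEL environment variable mentioned earlier (an assumption, not verified here):

```python
# Hedged sketch of option 2: force INFO-level TRT-LLM logging before any
# FlashInfer kernel is loaded. Assumes the vendored logger reads
# TLLM_LOG_LEVEL at startup; verify before relying on this.
import os
os.environ.setdefault("TLLM_LOG_LEVEL", "INFO")
```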

wzhao18 (Contributor) commented May 11, 2026

@arpera Can you request a patched release, 0.6.11.post1, with the fix you just merged? That should be a reasonable thing to ask.

Also, is the failure caused by the log level itself, or by other issues that were hard to check due to the verbose log level?

arpera (Contributor, Author) commented May 11, 2026

> Can you request a patched release, 0.6.11.post1, with the fix you just merged?

Thanks for the suggestion! I will ask the FI team to do this.

> Also, is the failure caused by the log level itself, or by other issues that were hard to check due to the verbose log level?

It was hard to check whether there were other issues besides that one because the logs were several GB each. I can see that some of the failed CI jobs are flaky and did not fail because of the update. Nevertheless, there is still a possibility that the FI version update caused something else in our CI.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
arpera requested a review from Harry-Chen as a code owner May 11, 2026 22:05